adhesiomeR: a tool for Escherichia coli adhesin classification and analysis

Adhesins are crucial factors in the virulence of bacterial pathogens such as Escherichia coli. However, to date no resources have been dedicated to the detailed analysis of E. coli adhesins. Here, we present the adhesiomeR software, which enables characterization of the complete adhesin repertoire, termed the adhesiome. AdhesiomeR incorporates the most comprehensive database of E. coli adhesins and facilitates an extensive analysis of the adhesiome. We demonstrate that adhesiomeR achieves 98% accuracy when compared with experimental analyses. Based on an analysis of 15,000 E. coli genomes, we define novel adhesiome profiles and clusters, providing a nomenclature for a unified comparison of E. coli adhesiomes.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-024-10525-6.


Analysis at the gene level
First, run a BLAST search against the pangenome. You can specify the number of threads to use with the n_threads argument:
library(adhesiomeR)
blast_results <- get_blast_res("pan_genome_reference.fa", n_threads = 8)
Next, extend the results obtained for the pangenome to individual genomes based on the gene presence/absence matrix:
blast_results_full <- pangenome_to_genome(blast_results, "gene_presence_absence.csv")
Now, use the extended results to obtain adhesin gene presence:
presence_df <- get_presence_table_strict(blast_res = blast_results_full, n_threads = 8)
By default, the results include all genes from the adhesiomeR database. If you wish to see only genes that were found in at least one file, set the add_missing argument to FALSE:
presence_df2 <- get_presence_table_strict(blast_res = blast_results_full, n_threads = 8, add_missing = FALSE)
Note that, because the pangenome represents each gene cluster by a single reference sequence, it is not possible to determine adhesin gene copy number with this approach. We also do not recommend using profile and cluster assignment here, as these were developed mainly for individual genome assemblies.
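The idea behind extending pangenome-level hits to individual genomes can be illustrated with a small base R sketch. This is not adhesiomeR's implementation of pangenome_to_genome: the hit table, its column names (adhesin, cluster), the cluster and genome names, the database gene list and the simplified 0/1 presence/absence matrix are all hypothetical (a real gene_presence_absence.csv has a different, richer layout). An adhesin gene is marked as present in a genome when the pangenome cluster it matched is present there, and the last two steps mimic the effect of the add_missing argument.

```r
# Minimal sketch of extending pangenome-level hits to genomes.
# All names and data are hypothetical, not adhesiomeR internals.

# Hypothetical hits: which pangenome cluster each adhesin gene matched
pangenome_hits <- data.frame(
  adhesin = c("fimA", "fimH", "ehaB"),
  cluster = c("group_1", "group_2", "group_7"),
  stringsAsFactors = FALSE
)

# Simplified presence/absence matrix: rows are pangenome clusters,
# columns are genomes, 1 = cluster present in that genome
pa_matrix <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    0, 0, 0,
    0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("group_1", "group_2", "group_5", "group_7"),
                  c("genome_A", "genome_B", "genome_C"))
)

# An adhesin gene is present in a genome if its matched cluster is present;
# transpose so that rows are genomes and columns are adhesin genes
presence <- t(pa_matrix[pangenome_hits$cluster, , drop = FALSE])
colnames(presence) <- pangenome_hits$adhesin

# Hypothetical full list of database genes (papG had no hit)
db_genes <- c("fimA", "fimH", "ehaB", "papG")

# Analogue of add_missing = TRUE: report every database gene, 0 if no hit
full <- matrix(0, nrow = nrow(presence), ncol = length(db_genes),
               dimnames = list(rownames(presence), db_genes))
full[, colnames(presence)] <- presence

# Analogue of add_missing = FALSE: keep genes found in at least one genome
found_only <- full[, colSums(full) > 0, drop = FALSE]

print(full)
print(found_only)
```

Because each adhesin is mapped through its cluster, every genome carrying that cluster inherits the hit; this is also why a single pangenome hit cannot reveal how many copies of the gene a genome has.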

Plotting results at the gene level
You can easily plot the presence/absence of adhesin genes. For simplicity (and due to the size of the full plot), we will plot only a few systems: type 1, Auf, Yhc, Pix and UCL fimbriae, as well as ehaB, cah and paa. By default (when the systems argument is not specified), genes from all systems are plotted. Note that if you analyse more than one genome, the heatmap results are clustered for clearer visualisation.
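The clustering step can be sketched in base R to show why it helps: genomes with similar adhesin repertoires end up next to each other. The presence table below is hypothetical, and the choice of a binary (Jaccard-type) distance with complete linkage is an assumption for illustration only; adhesiomeR's own plotting function may use a different distance or linkage method.

```r
# Minimal sketch: ordering genomes by similarity of their adhesin genes.
# The presence table is hypothetical; the distance/linkage are assumptions.
presence <- matrix(
  c(1, 1, 0, 0,
    0, 0, 1, 1,
    1, 1, 1, 0,
    0, 0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("genome_", 1:4), c("fimA", "fimH", "ehaB", "cah"))
)

# Binary (Jaccard-type) distance: the share of genes present in only one
# of two genomes among genes present in at least one of them
genome_dist <- dist(presence, method = "binary")

# Complete-linkage hierarchical clustering; $order gives the row order
genome_clust <- hclust(genome_dist, method = "complete")
ordered <- presence[genome_clust$order, , drop = FALSE]
print(ordered)
```

In the reordered table, genome_2 and genome_4 (identical repertoires) sit next to each other, as do genome_1 and genome_3. Base R's heatmap(presence, scale = "none") performs a similar row and column clustering (with Euclidean distance by default) while drawing the plot.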

Analysis at the system level
System information is obtained from gene presence: a system is considered present if all of its genes are found. As in the gene-level analysis, the results obtained for the pangenome are first extended to individual genomes based on the gene presence/absence matrix:
blast_results_full <- pangenome_to_genome(blast_results, "gene_presence_absence.csv")
The next step is to get gene presence information from the extended BLAST results. Here, you can set the identity and coverage thresholds for considering a gene present or absent. By default, adhesiomeR uses a threshold of 75% for both. In the resulting table, 1 indicates gene presence and 0 its absence:
presence_rel <- get_presence_table_relaxed(blast_res = blast_results_full, n_threads = 8)
You can modify the default thresholds using the identity and coverage arguments:
presence_rel2 <- get_presence_table_relaxed(blast_res = blast_results_full, identity = 90, coverage = 90, n_threads = 8)
By default, the results include all genes from the adhesiomeR database. If you wish to see only genes that were found in at least one file, set the add_missing argument to FALSE.
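The two rules in this section, thresholding BLAST hits into a 1/0 gene presence table and calling a system present only when all of its genes are present, can be sketched in base R. The hit table and its column names, the helper functions call_presence and system_presence, and the gene-to-system mapping (which uses only two type 1 genes, for illustration) are all hypothetical; whether adhesiomeR compares hits with >= or > at the threshold is also an assumption here.

```r
# Minimal sketch of threshold-based gene calls and system calls.
# Data, column names, helpers and the ">=" comparison are assumptions.
hits <- data.frame(
  genome   = c("genome_A", "genome_A", "genome_A", "genome_B", "genome_B"),
  gene     = c("fimA", "fimH", "ehaB", "fimA", "fimH"),
  identity = c(99.1, 85.0, 80.2, 98.0, 70.4),
  coverage = c(100, 95.0, 60.0, 100, 99.0),
  stringsAsFactors = FALSE
)

# A gene is present (1) in a genome if at least one hit reaches both
# the identity and the coverage threshold (in %); otherwise it is 0
call_presence <- function(hits, identity = 75, coverage = 75) {
  passed <- hits$identity >= identity & hits$coverage >= coverage
  tab <- tapply(passed, list(hits$genome, hits$gene), any)
  tab[is.na(tab)] <- FALSE  # gene with no hit at all in that genome
  tab * 1                   # logical -> 1/0
}

# Hypothetical gene-to-system mapping, for illustration only
systems <- list(
  type_1   = c("fimA", "fimH"),
  EhaB     = "ehaB",
  system_X = c("geneX1", "geneX2")  # genes absent from the hit table
)

# A system is present (1) only if all of its genes are present
system_presence <- function(presence, systems) {
  res <- vapply(systems, function(genes) {
    if (!all(genes %in% colnames(presence))) return(numeric(nrow(presence)))
    as.numeric(rowSums(presence[, genes, drop = FALSE]) == length(genes))
  }, numeric(nrow(presence)))
  matrix(res, nrow = nrow(presence),
         dimnames = list(rownames(presence), names(systems)))
}

presence_75 <- call_presence(hits)                              # defaults
presence_90 <- call_presence(hits, identity = 90, coverage = 90)
print(presence_75)
print(system_presence(presence_75, systems))
print(system_presence(presence_90, systems))
```

Raising both thresholds to 90% turns the fimH hit in genome_A (85% identity) into an absence, which in turn makes the type_1 system absent in that genome; a single missing gene is enough to switch a system off under the all-genes rule.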
